{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Azure AI Document Intelligence"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Azure AI Document Intelligence (formerly known as Azure Form Recognizer) is machine-learning \n",
    "based service that extracts text (including handwriting), tables or key-value-pairs from\n",
    "scanned documents or images.\n",
    "\n",
    "This current implementation of a loader using Document Intelligence is able to incorporate content page-wise and turn it into LangChain documents.\n",
    "\n",
    "Document Intelligence supports PDF, JPEG, PNG, BMP, or TIFF.\n",
    "\n",
    "Further documentation is available at https://aka.ms/doc-intelligence.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install langchain langchain-community azure-ai-documentintelligence -q"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 1"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first example uses a local file which will be sent to Azure AI Document Intelligence."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the initialized document analysis client, we can proceed to create an instance of the DocumentIntelligenceLoader:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader\n",
    "\n",
    "file_path = \"<filepath>\"\n",
    "endpoint = \"<endpoint>\"\n",
    "key = \"<key>\"\n",
    "loader = AzureAIDocumentIntelligenceLoader(\n",
    "    api_endpoint=endpoint, api_key=key, file_path=file_path, api_model=\"prebuilt-layout\"\n",
    ")\n",
    "\n",
    "documents = loader.load()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The default output contains one LangChain document with markdown format content: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "documents"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 2\n",
    "The input file can also be URL path."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "url_path = \"<url>\"\n",
    "loader = AzureAIDocumentIntelligenceLoader(\n",
    "    api_endpoint=endpoint, api_key=key, url_path=url_path, api_model=\"prebuilt-layout\"\n",
    ")\n",
    "\n",
    "documents = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "documents"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  },
  "vscode": {
   "interpreter": {
    "hash": "f9f85f796d01129d0dd105a088854619f454435301f6ffec2fea96ecbd9be4ac"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
